




Calhoun: The NPS Institutional Archive 
DSpace Repository 




2003-06 


Determining the importance of nationality on 
the outcome of battles using classification trees 


Cakan, Ali 


Monterey, California. Naval Postgraduate School 
http://hdl.handle.net/10945/1014 
Copyright is reserved by the copyright owner. 




NAVAL POSTGRADUATE SCHOOL 
Monterey, California 





THESIS 


DETERMINING THE IMPORTANCE OF NATIONALITY 
ON THE OUTCOME OF BATTLES USING 
CLASSIFICATION TREES 
by 
Ali Cakan 


June 2003 


Thesis Advisor: Thomas W. Lucas 
Second Reader: Samuel E. Buttrey 





Approved for public release; distribution is unlimited 




REPORT DOCUMENTATION PAGE 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including 
the time for reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and 
completing and reviewing the collection of information. Send comments regarding this burden estimate or any 
other aspect of this collection of information, including suggestions for reducing this burden, to Washington
Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite
1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project 
(0704-0188) Washington DC 20503. 


1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: June 2003
3. REPORT TYPE AND DATES COVERED: Master’s Thesis
4. TITLE AND SUBTITLE: Determining the Importance of Nationality on the Outcome of Battles Using Classification Trees
5. FUNDING NUMBERS
6. AUTHOR(S): Ali Cakan
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Postgraduate School, Monterey, CA 93943-5000
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): N/A
10. SPONSORING/MONITORING AGENCY REPORT NUMBER


11. SUPPLEMENTARY NOTES The views expressed in this thesis are those of the author and do not reflect the official 
policy or position of the Department of Defense or the U.S. Government. 


12a. DISTRIBUTION / AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited
12b. DISTRIBUTION CODE


13. ABSTRACT (maximum 200 words) 


Throughout history people have searched for a means of predicting the outcomes of battles. Data 
analysis is a way of understanding the factors associated with battle outcomes. There are objective factors, such as 
force ratio, and subjective factors, such as leadership, that affect battles. Subjective factors are hard to determine 
and thus are usually avoided in models. Here, nationality is investigated as a surrogate for subjective factors. That 
is, we want to see how nationality is associated with battle outcomes by exploring the best available data set on 
historical land combat—developed by the Center for Army Analysis. We focus on four countries for which there 
is sufficient data: the USA, Germany, Britain and Israel. We find that these countries historically use a substantial 
amount of military power to defeat their enemies. In particular, the USA often has overwhelming force. Using 
classification tree models, with a correct classification rate of 79 percent, the results suggest that nationality was 
the most important factor in battles before World War I and the second most important factor during the World 
Wars. Force ratio was the most important factor in WWI and artillery ratio in WWII. In the years following 
WWII, the dominant variable has been air force ratio. 


14. SUBJECT TERMS: Battle Outcomes, Force Ratios, Leadership, Nationality, Historical Combat, Air Force Ratio, WWI, WWII
15. NUMBER OF PAGES: 97
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: UL

NSN 7540-01-280-5500, Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18







Approved for public release; distribution is unlimited 


DETERMINING THE IMPORTANCE OF NATIONALITY ON THE OUTCOME 
OF BATTLES USING CLASSIFICATION TREES 


Ali Cakan 
First Lieutenant, Turkish Army 
B.S., Turkish Military Academy, 1998 


Submitted in partial fulfillment of the 
requirements for the degree of 


MASTER OF SCIENCE IN OPERATIONS ANALYSIS 


from the 


NAVAL POSTGRADUATE SCHOOL 


June 2003 
Author: Ali Cakan 
Approved by: Thomas W. Lucas 
Thesis Advisor 


Samuel E. Buttrey 
Second Reader 


Jim Eagle 
Chairman, Department of Operations Research 




ABSTRACT 


Throughout history people have searched for a means of predicting the outcomes 
of battles. Data analysis is a way of understanding the factors associated with battle 
outcomes. There are objective factors, such as force ratio, and subjective factors, such as 
leadership, that affect battles. Subjective factors are hard to determine and thus are 
usually avoided in models. Here, nationality is investigated as a surrogate for subjective 
factors. That is, we want to see how nationality is associated with battle outcomes by 
exploring the best available data set on historical land combat—developed by the Center 
for Army Analysis. We focus on four countries for which there is sufficient data: the 
USA, Germany, Britain and Israel. We find that these countries historically use a 
substantial amount of military power to defeat their enemies. In particular, the USA 
often has overwhelming force. Using classification tree models, with a correct 
classification rate of 79 percent, the results suggest that nationality was the most 
important factor in battles before World War I and the second most important factor 
during the World Wars. Force ratio was the most important factor in WWI and artillery
ratio in WWII. In the years following WWII, the dominant variable has been air force
ratio.




TABLE OF CONTENTS

I. INTRODUCTION
    A. INTRODUCTION
    B. BACKGROUND
        1. Trevor Dupuy
        2. Dean Hartley
        3. Faruk Yigit
        4. Muzaffer Coban

II. SUMMARY STATISTICS
    A. INTRODUCTION
    B. DESCRIPTIVE STATISTICS
        1. Treatment of the Data
        2. Response Variable
            a. Battle Outcome: “WINA”
        3. Objective Variables
            a. Force Ratio
            b. Artillery Ratio
            c. Close Air Support Ratio: “fly”
            d. Tank Ratio
            e. Cavalry Ratio
        4. Relative Variables
            a. Relative Surprise: “SURPA”
            b. Relative Initiative Advantage: “INITA”
    C. GENERAL DISCUSSION ON RELATIVE VARIABLES
    D. SUMMARY

III. CLASSIFICATION TREES
    A. INTRODUCTION
    B. TREE MODELS
        1. Model 1: The Battles Prior to World War I
        2. Model 2: The Battles of World War I
        3. Model 3: The Battles of World War II
        4. Model 4: The Battles that Israel Fought
    C. SUMMARY

IV. CONCLUSION
    A. FURTHER STUDY SUGGESTIONS

APPENDIX A. TABLES OF RELATIVE VARIABLES
    A. “SURPA”
    B. through K. (variable names illegible in the source)

APPENDIX B. BOXPLOTS OF OBJECTIVE VARIABLES
    A. FORCE RATIO
    B. ARTILLERY RATIO
    C. AIR FORCE RATIO

APPENDIX C. ACRONYMS

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST


LIST OF FIGURES

Figure 1.  Tree Model for the Battles Before WWI.
Figure 2.  Proportion of Battles Won By Attacker.
Figure 3.  Force Ratios of Attacking Countries.
Figure 4.  Artillery Ratio, Entire Data Set.
Figure 5.  Air Force Ratio, All Dataset.
Figure 6.  Tank Ratio, All Battles.
Figure 7.  Tank Ratio, Israel, Germany and Britain.
Figure 8.  Ratios of the Objective Variables of Battles in WW2. The USA is the Attacker and Winner.
Figure 9.  Ratios of the Objective Variables of Battles in WW2. The USA is the Attacker and Loser.
Figure 10. (title illegible in the source)
Figure 11. Tree Model of the Entire Data Set.
Figure 12. Model 1, Battles Before World War I.
Figure 13. Model 2, Battles of World War I.
Figure 14. Model 3, Battles of World War II.
Figure 15. Model 4, Battles that Israel Fought.




LIST OF TABLES

Table 1.  Battles Per Period, Attacker.
Table 2.  Battles Per Period, Defender.
Table 3.  Battles Per Period, Attacker.
Table 4.  Battles Per Period, Defender.
Table 5.  Force Ratio Averages.
Table 6.  Surprise Advantage, Attacker Wins.
Table 7.  Relative Initiative Advantage, Attacker Wins.
Table 8.  Ratio of All Relative Variables.
Table 9.  All Relative Variables with the Number of Battles.
Table 10. Number of Missing Values.
Table 11. Misclassification Rates of the Trees with and without Nationality.




ACKNOWLEDGMENTS 


I am grateful to my beloved country for giving me everything that I have. 




EXECUTIVE SUMMARY 


Throughout history, predicting the outcome of a battle before it starts has been a 
main concern of soldiers, historians and analysts. Different tools have been used to make 
predictions. Two of the most important and commonly used are simulation and data
analysis.


Simulations, built on mathematical models such as the Lanchester equations, are
becoming increasingly important as computer technology advances. Recent
developments in computer technologies and new algorithms have
made simulations very capable and reliable, but there still are pitfalls. For example, it is
difficult to model intangibles such as leadership and training, and these factors can be just
as important as a soldier’s weapon.


Another tool is data analysis. It is widely used and has been producing quite
satisfactory results. Moreover, unlike in simulations, it is possible to use intangibles in data
analysis models. In this work, we use data analysis. Our area of interest is nationality
factors. In other words, do different nations have different characteristics that affect the
outcome of a battle? If there are nationality factors, what are they? Do they change over
time? Can we use them to predict the outcome of a potential battle?


In our analyses, we used the CDB90G data set, developed for the Center for Army
Analysis (CAA). This is the best data set available on historical land combat. This data
set was first prepared by the Historical Evaluation and Research Organization (HERO) in
1983, and we are using the version with the latest updates. The CDB90G includes 657
battles from 1600 to the end of the 20th century. There are up to 152 attributes listed for
each battle.


Numerous people have worked with this data set, including some NPS Master’s
students. These researchers looked at different aspects of warfare and tried to answer
different questions. The first analysis was done by CAA, under the Combat History
Analysis Effort (CHASE) beginning in 1984. Afterwards, Dupuy [Ref. 2] tried to model
warfare without using advanced analysis techniques and formed the Quantified Judgment
Model. Hartley built his Oak Ridge Spreadsheet Battle Model, which allows the user to
predict the outcome of a potential conflict using an Excel spreadsheet [Ref. 1]. Yigit
looked at the famous rule of thumb that an attacker with greater than a 3:1 force ratio
wins, and at questions such as “How successful are the attackers? Do attackers suffer more
casualties?” [Ref. 3]. Coban [Ref. 4] used classification trees to build a model which
predicts the outcome of a potential battle.


Among the works mentioned above, Hartley’s claimed that nationality factors 
should have an important role in modeling warfare. There is another work on this subject 
which is of interest to us. Prior to the Gulf War, a British analyst, David Rowland, made 
accurate predictions about the results of the war, relying heavily on nationality factors
[Ref. 5].


To do the analyses, we divide the data set into four subsets by time period,
because the nature of warfare changes over time and battles within each time
period have similar characteristics. The first subset, battles before World War I (WWI),
covers the battles from 1600 to the beginning of WWI. The second subset is the battles in
WWI, the third subset is the battles in WWII, and the last subset is the battles after
WWII. We also focus on four countries, the USA, Germany, Britain and Israel, because
more data are available in the data set on these countries than on the others.


Our first analysis is done with the objective variables, namely force ratio, tank
ratio, artillery ratio, air force ratio and cavalry ratio. These come from hard data. That is,
the values for these variables can actually be collected from the battlefield. We use
boxplots to show the data structure and Wilcoxon’s rank sum test to compare different
hypotheses relating to the objective variables. We find that the USA has usually
accumulated great power on the battlefield. Especially in WWII, the air force and tank
ratios of the USA are overwhelming, almost incomparable to those of its enemies. We see
little or no difference between Germany and Britain, and we usually do not
see a statistically significant difference between the ratios of countries when they won and
when they lost. Among the countries, Israel has the smallest figures for all objective
variables except for air force ratio.
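As a rough illustration of the rank sum comparison mentioned above, the following sketch computes the Wilcoxon rank sum statistic with a normal approximation in plain Python. The force ratio values are invented for illustration and are not taken from the CDB90G data set. The function name and the omission of a tie correction in the variance are assumptions of this sketch, not the procedure used in the thesis.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test using a normal approximation.

    Returns (w, z, p), where w is the sum of the ranks of sample x in the
    pooled data. Ties receive average ranks; the variance is NOT
    tie-corrected (a simplification assumed for this sketch).
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p


# Hypothetical force ratios for battles won and lost by one attacker.
ratios_when_won = [1.4, 2.1, 2.8, 3.5, 1.9]
ratios_when_lost = [1.2, 1.7, 2.0, 2.6]
w, z, p = rank_sum_test(ratios_when_won, ratios_when_lost)
print(f"W = {w}, z = {z:.2f}, p = {p:.3f}")
```

With samples as small as these, an exact test would be preferable to the normal approximation; the sketch is only meant to show what the statistic measures.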


We also examine the relative variables. These variables, such as initiative,
leadership, and training, come from soft data, i.e., their values are decided by the
judgment of historians. Therefore, they are subjective and are usually avoided in models.
Our analyses showed that the data set does not have useful information for these
variables. For most of the battles, neither side was deemed to have an advantage with
respect to relative variables and the countries had similar patterns. Only Israel has a
different pattern from the other countries. For training, leadership and combat effectiveness,
it has an obvious advantage over its opponents in the battles it
fought.


We used classification trees in our final analyses to see if the nationality factors
were important. Tree-based modeling is an exploratory technique for uncovering
structure in data and is useful for summarizing large multivariate datasets [Ref. 7]. Trees
do not need distributional assumptions, and interactions between variables are
automatically included in the tree structure. In addition, they are robust to outlying data.
One of the advantages of tree-based models is that they are easy to read. There are oval
(nonterminal or split) and rectangular (terminal) nodes. Each node contains the predicted
outcome and the distribution to the child nodes. The split criterion is shown on each
branch.
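To make the idea of a split criterion concrete, here is a minimal sketch of how a single split on one numeric variable might be chosen. It uses the Gini impurity, which is an assumption of this example; this summary does not state which splitting criterion the thesis software used. The data values and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, impurity) for the best binary split on one variable.

    Candidate thresholds are midpoints between consecutive distinct values.
    The score is the size-weighted Gini impurity of the two child nodes.
    If no split improves on the unsplit node, threshold is None.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v < threshold]
        right = [l for v, l in zip(values, labels) if v >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical battles: force ratio and whether the attacker won (1) or lost (0).
force_ratio = [0.8, 1.1, 1.5, 2.4, 3.0, 3.6]
attacker_won = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(force_ratio, attacker_won)
print(threshold, impurity)
```

A full tree repeats this search over every candidate variable at every node and stops according to a pruning or minimum node size rule, which this sketch leaves out.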


The first model consists of the battles prior to WWI (Figure 1). In this period, tree
models show that nationality was the most important variable; that is, the first split
criterion is nationality. The second most important variable is force ratio. This model explains
76 percent of the battles; that is, the model classifies 76 percent of the battles correctly.
The second model, the battles in WWI, showed that nationality is the second most
important variable after force ratio. The second model explained 79 percent of the battles.
The third model was built with the information on the battles in WWII. Nationality is the
second most important variable after artillery ratio. This model also explained 79 percent of the
battles. The last model, the battles after WWII, consists of the battles in which Israel
participated. The only variable that appears in this model is the air force ratio. To
evaluate the importance of nationality in our tree models, we fit the models with and
without nationality factors and compared the misclassification rates of the two.
Nationality factors improved the accuracy of the model for the battles prior to WWI, but
the improvement was insignificant for the models for WWI and WWII, and nationality did
not appear in the last model.
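The comparison described above reduces to computing a misclassification rate for each fitted model on the same battles. The sketch below shows that arithmetic only; the outcome and prediction lists are invented for illustration and do not reproduce any result from the thesis, and the function name is hypothetical.

```python
def misclassification_rate(actual, predicted):
    """Fraction of battles whose predicted outcome differs from the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical outcomes (1 = attacker wins) and predictions from two trees.
actual = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
pred_with_nationality = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]
pred_without_nationality = [1, 0, 0, 1, 1, 1, 1, 1, 0, 0]

rate_with = misclassification_rate(actual, pred_with_nationality)
rate_without = misclassification_rate(actual, pred_without_nationality)
# The correct classification rate is one minus the misclassification rate.
print(f"with nationality:    {1 - rate_with:.0%} correct")
print(f"without nationality: {1 - rate_without:.0%} correct")
```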





Figure 1. Tree Model for the Battles Before WWI.


Figure 1 is our first model which includes the battles before World War I. 
The most important factor is nationality. If the defender is from one of the 
following, the USA, Britain, the Confederate States or Germany, the 
model suggests predicting a win for the defender. If we were to predict an 
outcome of a hypothetical battle in this period, we could predict the result 
only by looking at the nationality of the countries and we would be correct 
71 percent of the time. After nationality, the single most important 
variable is force ratio. 


Coban [Ref. 4] found that relative variables were the most important factors
before WWI. Our model for that period, without using any relative variables, explained
76 percent of the battles versus Coban’s [Ref. 4] 79 percent. This shows that we can
replace the relative variables with the nationality variable and still have a pretty good
model, at least for this data. The nationality variable is entirely objective, because
nationality is known before the war starts, whereas the relative variable values are very
difficult to determine and vary from analyst to analyst. The models for the other periods
did not show the nationality variables to be the most important factor. However, combining
the results from all other analyses with the results of our classification trees, we conclude that
having sufficient military power on the battlefield is a nationality factor for all four
countries.




I. INTRODUCTION 


A. INTRODUCTION 

When one reads about history, it is mostly the history of wars. It is not too much of a
stretch to say that wars shaped our history, and will continue to be one of the most
important phenomena shaping the future of the world. Because war is this important, a lot
of effort has been, and is being, devoted to exploring “the art of war”. One of the main
areas of interest has always been predicting the outcome of a battle before the first bullet
flies. Related to, and probably more important than, this question is “what relates to
winning?” As discussed below, many researchers, using different tools, have tried to
answer this question.


Simulation is one of the tools used to make predictions about potential battles.
Simulations are often built on mathematical models, such as Lanchester equations [Ref.
4]. In the past, capabilities of simulations were somewhat limited, but with improvements
in computer technology, much more capable simulations are available today. In the end,
though, simulations are simplifications of combat, and it has proven difficult to model
intangible factors, such as leadership, morale, and training [Refs. 1, 2, 3], which
according to many other studies greatly affect combat outcomes.


Another tool that analysts use to understand the nature of warfare is data analysis. 
The main challenge with data analysis is finding reliable, useful data. Furthermore, the 
data need to be detailed and large enough to find reliable answers. To some extent, we 
also have this problem, but the data set used in this research is considered to be the best 
data set available on historical land battles. In this work, the CDB90G data set is used. It 
is an updated version of the data set consisting of historical data prepared by the 
Historical Evaluation and Research Organization (HERO) in 1983, which includes battles
from 1600 through the Arab-Israeli wars towards the end of the 20th century.


In 1983, the U.S. Army Concepts Analysis Agency (CAA) contracted the Historical
Evaluation and Research Organization (HERO) to build a data set of historical combat
comprising 601 battles. CDB90G, the updated version, consists of 657 battles. There are
up to 152 attributes listed for each battle.


Numerous people have worked with this data set, including some NPS Master’s 
students. These researchers looked at different aspects of warfare and tried to answer 
different questions. The first analysis was done by CAA, under the Combat History 
Analysis Effort (CHASE) beginning in 1984. Afterwards, Dupuy [Ref. 2] tried to model 
warfare without using advanced analysis techniques and formed the Quantified Judgment 
Model. Hartley built his Oak Ridge Spreadsheet Battle Model, which allows the user to 
predict the outcome of a potential conflict using an Excel spreadsheet [Ref. 1]. Yigit 
looked at the famous rule of thumb that an attacker with greater than a 3:1 force ratio 
wins, and at questions such as “How successful are the attackers? Do attackers suffer more
casualties?” [Ref. 3]. Coban [Ref. 4] used classification trees to build a model which
predicts the outcome of a potential battle.


Coban’s work in particular is interesting since it uses the relatively new data
analysis method of classification trees to model combat. Tree-based models have certain
advantages over traditional linear models. They are usually easier to discuss and interpret
than linear models, and the treatment of missing values (NAs) is more satisfactory with
tree-based models than with linear models [Ref. 9:p. 378]. Being easier to understand,
the models can readily be used by people who are not very knowledgeable in the subject.


Among the works mentioned above, Hartley’s claimed that nationality factors 
should have an important role in modeling warfare. Prior to his research, the other works 
discussed did not emphasize the importance of nationality factors as much as did Hartley. 
There is another example of the importance of the nationality factors from a British 
analyst, David Rowland. Prior to the Gulf War, among all the predictions on the outcome 
of the operation, his was reportedly the most accurate [Ref. 5]. Rowland used nationality 
factors as his main variable. Before the campaign started, while most analysts were 
estimating that the battle would last for months and cost thousands of allied lives [Ref. 6], 
he predicted that it would be easy, and came up with figures close to what happened.
These two works, especially the second one, motivated this thesis.


The hypothesis that this thesis investigates is that a phenomenon called
“Nationality Factor” exists; that is, every nation has its own characteristics, which also
affect its military. For example, people think that the Germans have a long military
tradition and are good fighters. Indeed, Dupuy estimates that one German soldier had
more combat effectiveness than two Soviet soldiers [Ref. 2]. Japan has its own fighting
class, the samurai, who have hundreds of years of tradition, which makes the country’s
military unique and different from other countries. In Turkey, being a soldier is special. It
is said that “Every Turk is born as a soldier,” and it is a great honor, for a person and for
his family, to die in battle. Many more examples can be found. It is an undeniable fact that
there is more to winning than having more weapons or superior tactics or perhaps even
better training. It is interesting to look at battles where the side with apparently less
power was the victor, not only once, but many times. The recent Arab-Israeli Wars are a
clear example where an outgunned side (Israel) repeatedly won.


The purpose of this thesis is to search for the presence of a “Nationality Factor”
and find its effects, if any, using the CDB90G data set.

B. BACKGROUND 

1. Trevor Dupuy 

A retired U.S. Army colonel, Trevor N. Dupuy, founded the Historical Evaluation
and Research Organization (HERO), which constructed most of the data set used in this
thesis. After finishing the data set, he analyzed it himself. The
main product was the Quantified Judgment Model (QJM). The formulas in the QJM
are only a little more complicated than basic arithmetic. A main feature of the QJM is the
use of OLI values. OLI stands for Operational Lethality Index, which is a weapon’s
maximum effect under ideal conditions [Ref. 2:p. 30]. The effects of the battlefield, i.e.,
the changes from “ideal” conditions, are represented by different variables. The combat
power computation is built upon the corrected OLI factors (that is, with the effects of these
variables included), and as an end product the outcome value R (for result) is calculated
for both sides. If R_f - R_e (R_f: result, friendly; R_e: result, enemy) is positive, the model
predicts that the friendly side wins, and vice versa. Analysis is done to calculate the variable
values and effects. Nationality factors are not extensively used [Ref. 2].


2. Dean Hartley 

In his book, “Predicting Combat Effects,” Hartley analyzed the original HERO
dataset to determine whether there are any consistent formulae for predicting combat
effects. The results proved to be positive and were incorporated in a spreadsheet model
that predicts battle outcomes, including attrition, duration, advance, and victory.


Attrition, at the gross level, is determined to follow neither the Lanchester Square Law 
nor the Lanchester Linear Law. Instead, it follows a law between the Linear Law and the 


Logarithmic Law. See Equation (1) 


\[
\frac{dF}{dt} = -a\,F^{0.75}E^{0.40}, \qquad
\frac{dE}{dt} = -d\,E^{0.75}F^{0.40} \tag{1}
\]

where 

E = enemy manpower, 
F = friendly manpower, 
t = time. 

More extensively than the other works, Hartley used nationality factors as one of 
the more important variables, and they appear in many of his computations. For example, 
the predicted log duration of a battle is [p. 95]: 

PFLDURA2 = .31 + .24*AIRPL + .0000083*STARDAT - 157*TEMP + .00043*RXODP 
- .00047*ABAIYART + .000054*LWIDYART + .91*ATVAL + .96*DEVAL, 

where ATVAL is: 


if ATTACKER = "Arabs" then ATVAL = 0.5 
if ATTACKER = "Austria" then ATVAL = 0.2 
if ATTACKER = "England" then ATVAL = 0.2 
if ATTACKER = "European" then ATVAL = 0.4 
if ATTACKER = "France" then ATVAL = 0.0 
if ATTACKER = "Germany" then ATVAL = 0.0 
if ATTACKER = "Israel" then ATVAL = 0.3 
if ATTACKER = "Italy" then ATVAL = 0.8 
if ATTACKER = "Japan" then ATVAL = 0.0 
if ATTACKER = "Other" then ATVAL = -0.1 
if ATTACKER = "Russia" then ATVAL = 0.2 
if ATTACKER = "USA" then ATVAL = 0.0 
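
The ATVAL rules above amount to a lookup table. The following is a minimal Python sketch of that lookup and of the single term `.91*ATVAL` from the PFLDURA2 formula; it is an illustration only, not Hartley's spreadsheet. The names `ATVAL_COEFFICIENT` and `attacker_term` are hypothetical, and the DEVAL constants are not listed in this section, so the defender term is left out.

```python
# ATVAL constants exactly as listed in the text above.
ATVAL = {
    "Arabs": 0.5, "Austria": 0.2, "England": 0.2, "European": 0.4,
    "France": 0.0, "Germany": 0.0, "Israel": 0.3, "Italy": 0.8,
    "Japan": 0.0, "Other": -0.1, "Russia": 0.2, "USA": 0.0,
}

# Coefficient on ATVAL in the PFLDURA2 formula quoted above.
ATVAL_COEFFICIENT = 0.91


def attacker_term(attacker: str) -> float:
    """Attacker-nationality contribution to the predicted log duration.

    Raises KeyError for a nationality that is not in the list; how such
    cases were handled in the original model is not stated here.
    """
    return ATVAL_COEFFICIENT * ATVAL[attacker]
```

For example, `attacker_term("Italy")` evaluates 0.91 * 0.8, the largest attacker contribution in the list, while France, Germany, Japan, and the USA contribute nothing.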


By the same token, nationalities have different constants in almost every 
calculation. Thus, they greatly affect the end result. 


Despite the extensive usage and its benefits, using nationality factors is still 
considered suspect. The reason is the changing nature of national identity. It is not fixed 
and certainly does change over time. Once it was the Romans who ruled the world. 
France enjoyed military superiority from the time of Napoleon to the Franco-Prussian 
War, which it lost. In another example, the Ottomans were the main power for 
centuries and then they became “the sick man of Europe.” [Ref. 1] 

3. Faruk Yigit 

Yigit [Ref. 3] explored CAA’s revised version of the HERO database, the 
CDB90FT. This dataset consists of 660 battles and engagements with up to 140 different 
attributes on each. Yigit analyzed the 3-1 force ratio rule of thumb, the dispersion rate, 
and the daily casualty rate. He divided the data into chronological subsets and analyzed 
each subset. He concluded that force ratio was a reasonable predictor of outcomes. For 
example, a force ratio of 3 to 1 or greater leads an attacker to victory 68 percent of the 
time. Some of his other findings are that greater dispersion of combat troops is a reason 
for the decrease in casualties despite an increase in weapon lethality, and casualty rates of 
the attacker are almost always lower than those of the defender. 

4. Muzaffer Coban 

Coban [Ref. 4], using the latest version of the data set, CDB90G, used 
classification trees to build models that predict the outcomes of potential battles. Tree- 
based methods may be unfamiliar to some analysts, although many researchers like them 
since they present an attractive way to express knowledge and aid in decision making 
[Ref. 9:p. 251]. Coban looked at pre-selected variables, which he thought had more of an 
effect on the outcome of battle. The pre-selected variables were analyzed to show 
descriptive statistics and conditional plots. The pre-selected variables were: 


• Objective variables: force ratio, tank ratio, artillery ratio, cavalry ratio, 
the attacker’s primary tactical scheme, and the defender’s primary 
defensive posture. 

• Relative variables: relative surprise, relative air superiority in the theater, 
relative combat effectiveness, relative leadership advantage, relative 
training advantage, relative morale advantage, relative logistics advantage, 
relative momentum advantage, relative intelligence advantage, relative 
technology advantage, relative initiative advantage. 

• Terrain and weather variables: three terrain factors and five weather 
factors. 


The descriptive statistics and conditional plots revealed the association of the 
variables with the outcome of battles. The descriptive statistics revealed that the objective 
variables are not highly correlated with victory. Some of the relative variables, such as 
leadership, have a strong relationship with victory. However, relative variables are 
subjective and based on historical judgment. 


Using these variables, three tree-based models were considered. Model 1, with 
only the objective variables, resulted in high misclassification rates. This result was 
consistent with the findings from the descriptive statistics, namely that objective variables 
alone are not sufficient to classify battle outcomes. Model 2, with both objective and 
relative variables, had relatively low misclassification rates. Model 3 used terrain and 
weather variables, as well as the objective and relative variables. However, the resulting 
classification trees did not include the terrain and weather variables, and the 
misclassification rates were no better than those of Model 2. 


Coban conducted another analysis to understand the historical trends in battles. 
Multiple classification trees were built by using the objective and relative variables with 
training set sizes of 125. Each classification tree was built with a training set of 125 
consecutive battles, and the battle immediately after those 125 battles in the data set was 
predicted. Then, another classification tree was built with the next 125 battles, with an 
overlap of 124 battles. At the end, 658-125=533 classification trees were built and 533 
predictions made. This analysis revealed some important results. First, the importance of 
variables has changed throughout history. Second, the misclassification rates show that 
past battles failed to predict the battles of World War II, in which new tactics and 
weapons were introduced to fighters [Ref. 4]. 
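
The rolling scheme described above can be sketched as index bookkeeping. The Python below is a hedged illustration, not Coban's code: it fits no trees and only generates the training window and the single battle predicted at each step. The function name `rolling_windows` and the zero-based indexing are assumptions made for the example.

```python
def rolling_windows(n_battles, train_size):
    """Yield (train_indices, test_index) pairs, one per classification tree.

    Each window holds `train_size` consecutive battles (zero-based indices)
    and is paired with the battle immediately after it.
    """
    for start in range(n_battles - train_size):
        train = range(start, start + train_size)
        test = start + train_size
        yield train, test


# 658 battles with a training window of 125, as in the text.
windows = list(rolling_windows(658, 125))
```

With these inputs the generator produces 658 - 125 = 533 windows, and consecutive windows share 124 battles, matching the counts quoted above.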


In his thesis, Coban concluded that: 


The predictions of battle outcomes using classification trees revealed as 
high as 79 percent correct (clear-cut outcomes). This result is satisfying 
when the role of luck in battles and hard to quantify factors are considered. 
[Ref. 4] 


The most interesting part of this conclusion is the reference to hard-to-quantify factors, 
which became the topic of this thesis. 


It is always a challenge to work with intangibles. How can you measure things 
such as leadership or morale, especially before a conflict? Can nationality factors be a 
surrogate for these? 


In the CDB90G data set, there are values for the intangible variables as well, but 
since the purpose is to predict the outcome of a war, we need data before the war, not 
afterwards. However, I have a different point of view. I hypothesize that nations have 
their own characteristics, which are force multipliers. A good thing about this particular 
variable is that although it is a soft factor, the nationality factor, unlike other soft factors, 
is “objective”. The purpose of this work is to ascertain if nationality factors correlate with 
the outcome of battles above and beyond other variables. What are the nationality factors 
and can we really talk about them? If the answer is yes, do they change over time? Can 
we come up with a reasonable method to use nationality factors in predicting the outcome 
of a battle? 




II. SUMMARY STATISTICS 


A. INTRODUCTION 
This section explores and summarizes the data set using simple analysis 
techniques. Our purpose is to establish a good fundamental understanding of the data set 
before actually doing analysis with classification trees. 


To address the purpose of the thesis, that is, to determine the effect of nationality 
on the outcome of a potential battle, the data are analyzed with respect to different 
nationalities. In order to do this, we use different subsets of the data set with respect to 
different nationalities. Variable “nationA”, the nationality of the attacking force, is used 
as the classifying variable. In addition, “nationD”, the nationality of the defending force, 
is also used when necessary. 


One of the questions we want to answer is whether nationality factors change over 
time. To address this particular aspect, following Coban [Ref. 4], the data set is divided 
into six different time periods. These time periods reflect important changes in history. In 
each period, war was conducted differently than in the others, in that new technologies or 
new tactics were used. The battles within each period have similar properties. The first 
division is made at 1755, and therefore, the first period is 1600 to 1755. The Thirty 
Years’ War falls within the first period. The year 1756 marked the beginning of the 
Seven Years’ War, which was the largest of the pre-Napoleonic Wars in the data set. The 
second period is from 1756 to 1814, and includes the Seven Years’ War and the 
Napoleonic Wars. This was the period of great European powers, extensive usage of 
black powder, and big sailing ships. The year 1815 marks the fall of Napoleon and the 
beginning of a new era. In this period, from 1815 to 1914, a big portion of the data is 
from the American Civil War. This period ends in 1914, the beginning of World War I 
(WWI). The years 1914 to 1939 comprise the next period. This is mostly WWI, in which 
warfare changed in revolutionary ways, as many new technologies, such as tanks, 
airplanes, and chemical warfare, were used. The next period, from 1939 to 1945, has data 
from World War II (WWII). This is the most important subset because it has more data 
on the nations that we are interested in than the other subsets, and the way battles were 
fought more closely resembles today’s concepts. Another advantage with this period is 
that the data is more reliable because record keeping was much better than before WWII. 
The last period is from after WWII to the present. 
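
As an illustration of the six-way split, the following Python sketch assigns a battle year to a period. It uses the period labels that appear in the tables of this chapter (for example, “1913+ thru 1939”). Reading “N+” as “strictly after year N”, so that a boundary year belongs to the earlier period, is an assumption of this sketch; the text does not say how battles fought in a boundary year such as 1914 or 1939 were assigned. The names `UPPER_BOUNDS` and `period_of` are hypothetical.

```python
from bisect import bisect_left

# Upper bound (inclusive, by assumption) of every period except the last.
UPPER_BOUNDS = [1755, 1814, 1913, 1939, 1945]

LABELS = [
    "1600+ thru 1755",
    "1755+ thru 1814",
    "1814+ thru 1913",
    "1913+ thru 1939",
    "1939+ thru 1945",
    "1945+ thru 2000",
]


def period_of(year: int) -> str:
    """Return the period label for a battle fought in `year`."""
    # bisect_left returns the index of the first bound >= year, so a year
    # equal to a bound stays in the earlier period.
    return LABELS[bisect_left(UPPER_BOUNDS, year)]
```

Under this convention `period_of(1863)` falls in “1814+ thru 1913” (the American Civil War) and `period_of(1944)` in “1939+ thru 1945”.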


The number of battles of different countries in different periods in the data set is 
shown in the tables below. For acronyms, see Appendix C. 


Table 1. Battles Per Period, Attacker. 


RowNames | 1600+ thru 1755 | 1755+ thru 1814 | 1814+ thru 1913 | 1913+ thru 1939 | 1939+ thru 1945 | 1945+ thru 2000 

Table 2. Battles Per Period, Defender. 




Tables 1 and 2 show the number of battles in which countries were involved 
during different periods. The first table contains the numbers for when the country was the 
attacker (nationA), the second for when it was the defender (nationD). The countries that 
will be analyzed are highlighted. As an example, the USA attacked in 179 battles, 94 of 
them in WWII, and Germany defended in 180 battles, 110 of them in WWII. 


Tables 1 and 2 reveal a problem. Although the data set includes 657 battles, the 
number of battles decreases dramatically when divided into subsets, making analysis 
difficult. To overcome this problem, the following is done. 


The years 1600 to 1914 are considered as a single period. This follows Coban [Ref. 4], 
who considered the battles prior to WWI as a group. He showed that intangibles are the 
most important factors in this period. 


The names of the following countries are combined: BR and ENG, PR and GER, and 
SOV and RUSS. CS (Confederate States) and the USA are not combined, because the 
author considered them different countries since the battles of CS were against the USA. 
This is also in line with the way Hartley analyzed the data [Ref. 1]. Also, again for the 
same purpose, the focus will be on four nations: the USA, Germany, Britain, and Israel. 
The new tables of battles per period for the four countries we will analyze are Tables 3 
and 4. 
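
The merging of country codes can be sketched as a small mapping. This Python example is illustrative only: the text says which codes are combined but not which code of each pair is kept, so the choice of “BR”, “GER”, and “RUSS” as the surviving codes is an assumption, as are the names `MERGED_CODES` and `canonical_nation`.

```python
# Pairs combined in the text; the surviving code of each pair is an assumption.
MERGED_CODES = {"ENG": "BR", "PR": "GER", "SOV": "RUSS"}


def canonical_nation(code: str) -> str:
    """Map a nationality code to its merged code; other codes pass through."""
    return MERGED_CODES.get(code, code)
```

Codes that are deliberately kept separate, such as CS and USA, are returned unchanged.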


Country | 1600+ thru 1913 | 1913+ thru 1939 | 1939+ thru 1945 | 1945+ thru 2000 | total 

Table 3. Battles Per Period, Attacker. 

Table 4. Battles Per Period, Defender. 




B. DESCRIPTIVE STATISTICS 

In this section, the important variables of the CDB90G data set will be analyzed. 
Fifteen different variables are considered as potentially important, that is, potentially 
affecting the outcome of the battle. This decision follows the selections made by Coban 
[Ref. 4] and Hartley [Ref. 1], and also reflects the author’s military judgment. The 
variables are divided into two subsets of “objective” and “relative” variables. Objective 
variables are those whose values can be collected from the battleground or from “hard” 
data. They are force ratio, artillery ratio, air force ratio, cavalry ratio, and tank ratio. 
These variables can be known before the confrontation and can be agreed upon by 
different people. While the accuracy of this data is suspect [Ref. 1], they are based on 
numbers, so they have the same meaning for everybody. As an example, one tank is one 
tank for all analysts. Therefore, these variables are called objective variables. On the 
other hand, relative variables, such as leadership, training, and combat effectiveness, are 
totally subjective, their values being based on the judgment of military historians. Unlike 
the case with objective variables, it is extremely difficult to decide the values of these 
before the battle, and differences between different people’s figures are almost 
guaranteed. Therefore, they are called “soft” data, and are almost universally avoided in 
models [Ref. 1]. We will not be an exception. 


For our purposes, then, objective variables are much more important than relative 
variables. There are other works, Hartley [Ref. 1] and Coban [Ref. 4], which used relative 
variables in their models. We will follow a different method. All variables will be 
analyzed in this section to reveal characteristics of different nations, but after this section, 
the relative variables will not be analyzed again. Instead, we will try to replace all the 
relative variables with just one variable: Nationality. 

1. Treatment of the Data 

In the data set, some relative variables, relative combat effectiveness, leadership, 
training, morale, logistics, momentum, intelligence, technology, and initiative, have 
values ranging from “—4” to “+4.” A value of “—4” shows that the variable very strongly 
favors the defender, while “+4” shows that the variable very strongly favors the attacker. 
A level of “0” favors neither side. The variable surprise is given in a scale between “—2” 
and “+2.” Again, negative values favor the defender and positive favor the attacker. 


However, it is very difficult to scale these qualities in this much detail. Therefore, 
following Coban’s methodology, we will give those variables only 3 values: “A” for an 
advantage to the attacker, “D” for an advantage to the defender, and “O” for no 
advantage to either side. [Ref. 4] 
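
The three-level recoding can be sketched in a few lines of Python. This is a minimal illustration under the sign convention given above (negative values favor the defender, positive values favor the attacker); the function name `recode_advantage` and the handling of missing values as `None` are assumptions of the sketch.

```python
def recode_advantage(score):
    """Collapse a signed advantage score (e.g. -4..+4) to 'A', 'D', or 'O'."""
    if score is None:
        return None  # keep missing values missing (assumption)
    if score > 0:
        return "A"   # advantage to the attacker
    if score < 0:
        return "D"   # advantage to the defender
    return "O"       # no advantage to either side
```

The same function applies unchanged to the surprise variable, whose scale runs from -2 to +2.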


Since Coban also used the same data analysis method, classification trees, in his 
models, we will try to follow him when possible, and try to compare our findings with 
his. As in his analysis, weapons effects are expressed as ratios. In some battles, the 
attackers had no weapons of a particular type. This makes the ratio zero, which gives no 
information about the number of the defender’s weapons. In some other cases, the 
defender had no weapons and that makes the ratio infinity. Adding a constant to both 
sides avoids these two pitfalls. Therefore, in finding ratios, one is added to each side’s 
strength. When neither side had a particular weapon system, e.g., tanks, a missing value 
indicator is assigned to the ratio variable. [Ref. 4] 
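
The ratio rule just described can be written as a short Python function. This is a sketch of the stated rule, not the original S-Plus code; the name `weapon_ratio` and the use of `None` as the missing-value indicator are assumptions.

```python
def weapon_ratio(attacker_count, defender_count):
    """Attacker-to-defender ratio with one added to each side.

    Returns None (missing) when neither side had the weapon system.
    """
    if attacker_count == 0 and defender_count == 0:
        return None
    return (attacker_count + 1) / (defender_count + 1)
```

With the added constant, an attacker with no tanks facing 9 tanks gets a ratio of 0.1 rather than zero, and an attacker with 9 tanks facing none gets 10 rather than infinity, so both cases still carry information about the other side's strength.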


Descriptive statistics will help understand the properties of objective and relative 
variables and nationalities. Tables, boxplots, barplots and histograms are used when 
informative. 

2. Response Variable 

a. Battle Outcome: “WINA” 
The outcome of the battle is expressed in variable “WINA”. A value of 
“1” represents an attacker win; “-1” means that the attacker did not win, that is, either 
the attacker lost or the historians judged that the battle was a draw. 


[Figure 2: bar chart of the proportion of battles won (0.00 to 1.00) by period: 
1600+ thru 1913, 1913+ thru 1939, 1939+ thru 1945, 1945+ thru 2000.] 

Figure 2. Proportion of Battles Won By Attacker. 


Figure 2 shows the ratio of the battles won to all of the battles fought 
(Battles Won/Battles Fought | Attack). Israel existed only in the last period, and we do 
not have any data from Britain or Germany after WWII. For this reason, there are some 
missing bars. Before WWI, the Germans won a large portion of the battles when they 
were attackers, and this ratio consistently fell in later periods. The USA’s hundred 
percent refers to battles in the Korean War. 
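
The quantity plotted in Figure 2 is a conditional proportion: battles won divided by battles fought, for a given attacker and period. A minimal Python sketch follows. The input layout, a sequence of (nation, period, WINA) tuples, and the function name `attacker_win_proportions` are assumptions; the function is meant to be applied to rows taken from the data set, and no data values are implied here.

```python
from collections import defaultdict


def attacker_win_proportions(battles):
    """Return {(nation, period): proportion of attacks won}.

    `battles` is an iterable of (nation, period, wina) tuples, where wina
    is 1 for an attacker win and -1 otherwise.
    """
    won = defaultdict(int)
    fought = defaultdict(int)
    for nation, period, wina in battles:
        key = (nation, period)
        fought[key] += 1
        if wina == 1:
            won[key] += 1
    # Combinations with no battles never appear, which corresponds to the
    # missing bars in the figure.
    return {key: won[key] / fought[key] for key in fought}
```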

3. Objective Variables 

As mentioned in the introduction, the objective variables to be analyzed include 
force ratio, artillery ratio, tank ratio, cavalry ratio and air force ratio. All of the countries 
are analyzed when they were attacking. Although analyses with nation defending were 
done as well, they are not presented here because the results were not useful. 

a. Force Ratio 


The basic formula for force ratio is: 
FR=A/D, 
where 
A is the total strength of the attacker in manpower and 
D is the total strength of the defender in manpower. 

The strength refers to only the combatants, and troops on either side are 
assumed to be identical. The following is a boxplot of the force ratios. 











































































































































































































[Figure 3: boxplots of forceRatio (x-axis, 0 to 15) by attacking country (USA, IS, GER, 
BR), drawn in two panels according to the value of WINA.] 

Figure 3. Force Ratios of Attacking Countries. 


The first boxplot will be explained in detail. This plot is drawn by the 
“bwplot()” command in S-Plus version 2000 [Ref. 13]. This function enables us to 
draw boxplots for multiple groups, in this case attackers, on one chart. The force ratios 
are on the X axis, and the names of the countries are on the Y axis. The graphic is divided 
into two sections by a vertical line. Above the section on the left reads the number “-1”, 
which refers to the value of the outcome variable, “WINA”. As explained in the 
respective section, -1 means that the attacker did not win. So, while the force ratio 
boxplots of the battles that the attacker did not win are on the left, the ones that the 
attacker won are on the right. 


The rectangle-like shapes in the plot are called “boxplots”. In recent years, 
boxplots have successfully been used to describe the prominent features of data sets. 
These features include center, spread, the extent and nature of any departure from 
symmetry, and identification of outliers [Ref. 8]. The point in the center shows the 
median. The width of the rectangle is an indicator of variability: the wider the rectangle, 
the greater the variability. This width is called the fourth spread, fs. The middle 50 
percent of the data (roughly, the data between the first and the third quartiles) fall in this 
rectangle. The left end of the rectangle (lower fourth) is the median of the smallest n/2 
observations, and the right end (upper fourth) is the median of the largest n/2 
observations. The whiskers on both sides extend to the smallest and the biggest 
observations that are not outliers. Any observation farther than 1.5 fs from the closest 
fourth is an outlier and is represented as a small circle. 
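
The boxplot quantities defined above can be computed directly. The Python sketch below follows those definitions; it is an illustration, not the S-Plus implementation, and the plotted boxes may use a slightly different quartile rule. For an odd number of observations, the sketch includes the overall median in both halves, which is an assumption, since the text only says “n/2 observations”. The name `boxplot_summary` is hypothetical.

```python
from statistics import median


def boxplot_summary(values):
    """Median, fourths, fourth spread, whisker ends, and outliers."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # odd n: the median is counted in both halves (assumption)
    lower_fourth = median(data[:half])
    upper_fourth = median(data[n - half:])
    fs = upper_fourth - lower_fourth
    low_fence = lower_fourth - 1.5 * fs
    high_fence = upper_fourth + 1.5 * fs
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median(data),
        "lower_fourth": lower_fourth,
        "upper_fourth": upper_fourth,
        "fs": fs,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }
```

For the made-up sample 1, 2, 3, 4, 5, 6, 7, 8, 30 the fourths are 3 and 7, so fs = 4, the upper limit is 7 + 1.5 * 4 = 13, and 30 is flagged as the only outlier.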


Force ratio is universally considered to be an important factor in battle 
outcomes. When Figure 3 is examined, it can be seen that there are differences between 
countries. The first difference is in their force ratios. The next table has the average force 
ratios of the countries, while attacking, in the battles they won and in the battles they lost. 


Table 5. Force Ratio Averages. 





The USA has the highest average force ratio. Israel has a higher force 
ratio in the battles it lost than in the battles it won. 


As can be seen from Table 5, the USA has a bigger force ratio than the 
others. The USA’s force ratio is three times that of Israel, which always has a smaller 
force ratio than the other three countries. Israel also has less variability. The boxplots in 
Figure 3 are almost symmetric, that is, the distributions of force ratios for a particular 
country when it won and when it lost are almost identical, except for the USA. Normally, 
the force ratio in the battles won is expected to be higher than in the ones lost. Wilcoxon’s 
rank-sum test is used to see whether the median force ratio of Germany and Britain in the 
battles they won is greater than in the ones they lost. Since we are using the whole 
population, the answer to this question is intuitive and does not require any statistics; it is 
only necessary to compare the medians. Wilcoxon’s rank-sum test is instead being used 
to see if the differences in median force ratio when winning and losing for Germany and 
Britain are indistinguishable from what would be obtained by random samples from the 
same distribution. 

$H_0: \mu_1 - \mu_2 = 0$ 

$H_a: \mu_1 - \mu_2 > 0$ 

where 

$\mu_1$ = Median force ratio when attacking and winning 

$\mu_2$ = Median force ratio when attacking and losing 


The Wilcoxon rank-sum test yields a p-value of 0.4845 for Britain and 
0.183 for Germany. At a five percent significance level, neither value provides evidence 
to reject the null hypothesis for either country. That is, the medians are statistically 
indistinguishable. In other words, neither of the countries had a significantly higher force 
ratio for the battles they won than the battles they lost. When the medians of Germany’s 
and Britain’s force ratios when they won are compared, the p-value is 0.071. This 
suggests a difference in their force ratios, but the hypothesis that the medians of these two 
countries when attacking and winning are the same cannot be rejected at the 0.05 
significance level. 
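
To make the mechanics of the one-sided rank-sum test concrete, here is a self-contained Python sketch. It uses the large-sample normal approximation without a tie or continuity correction, which is an assumption of this sketch; the p-values quoted above come from S-Plus, which may use an exact or corrected calculation, so this code is not expected to reproduce them. The names `midranks` and `rank_sum_test_greater` are hypothetical.

```python
import math


def midranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test_greater(won, lost):
    """Return (W, p) for H_a: ratios in `won` tend to exceed those in `lost`."""
    n1, n2 = len(won), len(lost)
    ranks = midranks(list(won) + list(lost))
    w = sum(ranks[:n1])                       # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2             # E[W] under the null hypothesis
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))     # upper-tail normal probability
    return w, p
```

A large p-value means the rank sum of the winning battles is about what random assignment of ranks would produce, which is the situation reported above for Britain and Germany.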


The USA definitely has a bigger ratio and more spread in the battles it won 
while attacking than in the battles it lost. The final interesting feature is that, 
not only did the USA have a higher force ratio, but it also has many outliers. The 
numerical dominance of the USA on the battlefield and its effects will be discussed later. 

b. Artillery Ratio: “arty” 

arty = A_A / A_D 

where 
A_A = Number of artillery tubes of the attacker and 
A_D = Number of artillery tubes of the defender. 

This is the only variable which is present in all periods. Prior to WWI, the 
artillery ratio varied a lot [App. B.B.]. After the start of the 20th century, artillery was 
used extensively. 








































































































































































































[Figure 4: boxplots of arty (x-axis) by attacking country, drawn in two panels according 
to the value of WINA.] 

Figure 4. Artillery Ratio, Entire Data Set. 


Figure 4 shows the boxplots of artillery ratios in the entire dataset. The 
battles with an artillery ratio more than 20 are not included for the sake of 
interpretability. One point is worth mentioning. During WWII, the USA 
has an average artillery ratio of 8.56. This is very much affected by a very 
big advantage, 20.18 in 1944. 


Figure 4 suggests that, like force ratio, there are differences in artillery 
ratios between different countries as well. The USA again had a very large advantage 
compared to other countries, and more so in the battles they won. Like force ratio, many 
outliers can be seen in the USA’s boxplots. Israel has the smallest advantage compared to 
other forces. Britain’s artillery ratio looks higher in the battles they won than the battles 
they lost. Germany’s artillery ratio when they won and when they lost look similar. 
Again, as we did with the force ratios, Wilcoxon’s rank-sum test is used to test 
whether the artillery ratio when winning is greater than when losing for Britain and 
Germany. 


The test yields a p-value of 0.03161 for Germany. This suggests that, at a 
five percent significance level, the artillery ratio when winning is higher than when 
losing, as expected. 

For Britain, the p-value of 0.5193 suggests that the median 
artillery ratios when winning and losing are indistinguishable. 

c. Close Air Support Ratio: “fly” 

fly = F_A / F_D 

where 
F_A = Number of close air support sorties of the attacker and 
F_D = Number of close air support sorties of the defender. 


Close air support is very important in today’s warfare. After armies around 
the world began to use them in combat, airplanes became one of the most important 
factors. Coban [Ref. 4] found that it is the most important variable in wars after WWI. 
Today, the first Gulf War and operations in Serbia have proven that an air force is a 
dominant factor in defining the outcome of a battle. 

Although airplanes were used in WWI, there is so little data in the data set 
that we decided to start with WWII, which is the first war in the data set in which air 
forces played a major role in the outcome of battles. 


[Figure 5: boxplots of the air force ratio (x-axis, 0 to 800) by attacking country.] 

Figure 5. Air Force Ratio, All Dataset. 


Figure 5 contains all battles post-WWI. It is very similar to the plot drawn 
with the data from WWII [App. B.C.], the difference being data from the 
Arab-Israeli wars and the Korean War. 


The USA used airplanes much more than other countries. The graph is 
affected greatly by the very high figures of the USA. Therefore, it is difficult to read 
other countries’ boxplots. Unlike artillery ratio, we did not worry about truncating the 
data at a particular point this time because the difference is very large. The USA’s 
overwhelming dominance with respect to the air force is an undeniable fact. 

Among other countries, Israel used its air force more than Germany or 
England. Although countries other than Britain had a bigger air force ratio when they 
won than when they lost, the differences are small. 


d. Tank Ratio: “tank” 

tank = T_A / T_D, 

where 
T_A = Number of tanks on the attacker side and 
T_D = Number of tanks on the defender side. 

Just like planes, tanks were used in WWI on a very small scale, but the 
real use of tanks happened in WWII. 





[Figure 6: boxplots of tank (x-axis, 0 to 300) by attacking country.] 

Figure 6. Tank Ratio, All Battles. 


Figure 6 has battles of the entire data set. It is very similar to the plot 
drawn with the data from WWII [App B.D.]. This plot is also affected by 
the USA’s dominance. 




To compare countries other than the USA, the following boxplot is used. 




























































































































































































[Figure 7: boxplots of tank (x-axis, 0 to 5) by attacking country (IS, GER, BR).] 

Figure 7. Tank Ratio, Israel, Germany and Britain 


Figure 7 includes data with a tank ratio of 20 or less. With this truncation, 
12 data points out of 501 are lost from Figure 6. This truncation is 
necessary to be able to compare these three countries. 


According to Figure 7, Germany had a higher tank ratio than the other two 
countries. Britain and Israel’s tank ratios, Britain’s especially, were higher in the battles 
they lost than the battles they won. This variable seems to have different patterns within 
every individual country. 





[Figure 8: line chart titled “Variable Ratio Values - Battles USA Won During WWII”; 
series arty, fR, fly, and tank; x-axis 1942, 1943, 1944; y-axis 0 to 200.] 

Figure 8. Ratios of the Objective Variables of Battles in WW2. The USA is the Attacker 
and Winner. 

[Figure 9: line chart titled “Variable Ratio Values - Battles USA Lost During WWII”; 
series arty, fR, fly, and tank; x-axis 1942, 1943, 1944; y-axis 0 to 200.] 

Figure 9. Ratios of the Objective Variables of Battles in WW2. The USA is the Attacker 
and Loser. 


As Figures 8 and 9 suggest, towards the end of WWII, the USA began to 
have a very big advantage over its opponents. This advantage became 
overwhelming in the “tank” and “fly” ratios. This big advantage over the 
opponents is the main reason for the variability pattern in the charts 
analyzed above. 


e. Cavalry Ratio: "cav" 
cav = C_A / C_D 

where 
C_A = Number of cavalry on the attacker side and 
C_D = Number of cavalry on the defender side. 

Cavalry Ratio is present in the data set from 1600 to 1905. 





[Figure 10: boxplots of cav (x-axis) by attacking country.] 

Figure 10. Cavalry Ratio. 


While Britain and Germany had similar cavalry ratios when they won or 
lost, the USA had a much bigger ratio when it won than when it lost. The 
USA’s cavalry ratio also has a high variability in the battles where the 
USA won. Again, the USA has a higher ratio than the other countries. 




The cavalry ratio is the last objective variable to be analyzed. In the next 
section, the relative variables will be analyzed. A general discussion on all of the 
variables analyzed, both objective and relative, can be found at the end of this chapter. 

4. Relative Variables 

Relative variables are represented as categorical variables. As discussed in the 
previous section, relative variables are generally avoided by analysts in their models. 
Although no question exists concerning the importance of these variables, the fact that 
their values depend largely on personal judgment makes them less reliable. 


Another reason for not using them in our classification trees is that the data does 
not have sufficient information on them. This is discussed at the end of this chapter in the 
“Discussion on the Relative Variables” section. However, a preliminary analysis is done 
to see the relationship between nationality and these variables. At this point, it is worth 
remembering our goal: to replace the relative variables with one variable: nationality. 


In the following tables, the letter “A” denotes the battles in which the attacking 
side had an advantage, “D” denotes an advantage for the defending side and “O” means 
there was no advantage on either side. For example, if the “SURPA” is “A” for a 
particular battle, it means that, in that battle, the attacker had the “Relative Surprise” 
advantage, “D” says the defender had the advantage, and “O” says neither had the 
advantage. To familiarize the reader, an example table is explained below: 


The “COUNTRY” column shows the name of the country. The “OVERALL” 
section of the time periods represents the whole dataset. 


The cells having a number higher than 50 percent are highlighted, which makes it 
easier to see higher figures and patterns in the data. The cells where a corresponding 
figure is not available in the data set (for example, “ISRAEL” does not have any battles 
prior to 1946) are simply marked with “na”. 









[Example table: “Surprise Advantage, Attacker Loses”, annotated with the callouts 
“Name of the Variable” and “The Role of the Country”; the example cell contains 0.14.] 


Now, we will explain how to read this table, using an example cell. The cell above says 
that, in WWII, among those battles in which Britain was the attacker and loser, the 
attacker (Britain) had the “Surprise Advantage” 14 percent of the time. 
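
The way a single cell is computed can be sketched in Python: filter the battles by attacking nation, period, and outcome, then take the share of each advantage code. The sketch below is illustrative only. “nationA”, “WINA”, and “SURPA” are variable names from the text; the “period” field, the dictionary row layout, and the function name `advantage_shares` are assumptions, and the function is meant to be applied to rows taken from the data set.

```python
from collections import Counter


def advantage_shares(battles, nation, period, wina, variable="SURPA"):
    """Shares of 'A', 'D', and 'O' among the selected battles, or 'na'.

    `battles` is a list of dicts with keys nationA, period, WINA, and the
    relative variable of interest (already recoded to 'A', 'D', or 'O').
    """
    codes = [
        b[variable]
        for b in battles
        if b["nationA"] == nation and b["period"] == period and b["WINA"] == wina
    ]
    if not codes:
        return "na"  # no battles for this combination, as in the tables
    counts = Counter(codes)
    return {code: counts[code] / len(codes) for code in "ADO"}
```

A cell value of 0.14 would arise, for instance, if the attacker held the advantage in one of seven selected battles (1/7 is about 0.14); the actual counts behind the cell are not given in the text.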




The relative variables considered important are analyzed below. First, we 
highlight the ones that appeared to be more important than the others. These two 
variables were also found important by Coban [Ref. 4]. 


a. Relative Surprise: “SURPA” 
Table 6. Surprise Advantage, Attacker Wins.





For most of the battles, regardless of nation or period, there was no 
advantage on either side. Significant ones are highlighted. It is worth noting that in all the 
battles the attacker won, the defender never had a surprise advantage. 

b. Relative Initiative Advantage: “INITA” 

This is one of the more important variables. It can be said that, among all
the relative variables, this is the only one with consistently significant values. One
interesting point is that the attacker had an initiative advantage in more than 75 percent of
the battles it won in all of the subsets except for one: Germany had an advantage in only 64
percent of the battles it won during WWI. When the attacker won, the defender never had an
initiative advantage.


Table 7. Relative Initiative Advantage, Attacker Wins.


SURPA and INITA are two of the most important variables in the data
set. Other variables will be discussed with the help of Table 8. This table includes all of
the relative variables used in the analysis. Time periods are not as detailed as in the
individual tables like Tables 6 and 7. Tables of individual variables are in Appendix A,
and they will be referred to when necessary. Again, all of the figures of Table 8 are from
the battles where the countries were attacking.


In Table 8, the values in the cells, as in previous tables, are
proportions. To make the tables easier to read, the cells are formatted
differently in each of the following cases:

If the cell contains a value greater than or equal to 0.8

If the cell contains a value between 0.5 and 0.8

If the cell contains a value less than 0.2


Table 9 contains the exact numbers for each cell, instead of ratios. Figures 


in each cell refer to the number of battles. 


Table 8. Ratio of All Relative Variables.


Table 9. All Relative Variables with the Number of Battles.


Tables 8 and 9 summarize all relative variables for our four countries
while attacking. There is no time segmentation; all sections of the tables show
figures from the entire data set. While Table 8 contains ratios, Table 9 shows the exact
number of battles in each cell.


The rest of the relative variables, those with less significance, are listed 
below. The detailed time period tables are provided in Appendix A. 

(1) Relative Combat Effectiveness: “CEA”. Until WWII, when 
the attacker won, the defender never had a combat effectiveness advantage. In WWII, in 
Britain’s battles, the defender had this advantage in 53 percent of the battles and Britain 
still won. Israel had this advantage even in the battles it lost (86 percent). [App A.B.] 

(2) Relative Leadership Advantage: “LEADA”. Until WWI, in 
more than half of the battles, the side with this advantage won. In WWI and WWII, 
neither side had a significant advantage. In the battles Israel fought, the defender never 
had a leadership advantage. [App A.D.] 

(3) Relative Morale Advantage: “MORALA”. There is no
significant morale advantage on either the attacker or the defender side, except for the USA. In WWI,
the USA had a relative morale advantage in all battles. [App. A.F.]

(4) Relative Logistics Advantage: “LOGSA”. None of the
nations had a significant logistics advantage in any of the battles. [App A.G.]

(5) Relative Momentum Advantage: “MOMNTA”. The only
significant advantage is on Germany’s side in WWII. Germany had a momentum
advantage in 65 percent of the battles in WWII where it attacked and won. Also, in the
entire data set, the defender never had the momentum advantage, with one exception: US
battles in WWII. In six percent of the battles, the USA lost when attacking and the
defender had a momentum advantage. [App A.H.]

(6) Relative Intelligence Advantage: “INTELA”. There is no 
significant advantage on either side. The only exception is Germany in both World Wars. 
It is not very significant, but when Germany attacked and won, it had an advantage in 36 


percent of the battles in WWI and 40 percent in WWII. [App A.L.] 




(7) Relative Air Superiority: “AEROA”. This variable
determines the quality of the air force. The USA and Britain had this advantage in a large
portion of the battles they lost, but not so large in the battles they won. Israel had this
advantage in 86 percent of its battles.


C. GENERAL DISCUSSION ON RELATIVE VARIABLES


In this section, we look at the relative variables together.


Looking at Table 8, three trends can be easily seen: 


The defender hardly ever had an advantage over the attacker. The
countries we analyze are the attackers. Only 7 out of 80 cells belonging to
the defender have values of more than 20 percent, the highest being 31
percent. Thus, can we say that this data suggests these countries always
fought against countries possessing inferior qualities? Not really, because
they also fought each other. However, interestingly enough,
according to the data, for all variables with the exception of “Initiative
Advantage”, there is often no advantage on either side. In more than 50
percent of the battles, neither side had the advantage, except for those of
Israel.


Israel has different values than the other countries analyzed. For the variables
“CEA”, “LEADA” and “TRNGA”, Israel had an obvious advantage over
the defender. However, interestingly enough, there is no significant
difference between Israel’s degree of advantage in the battles it won and in those it lost.
For example, it had a Leadership advantage in 57 percent of the battles
both when it won and when it lost. It also had a Combat Effectiveness
advantage in all of the battles (100 percent) it won and in 86 percent
of the battles it lost.


“Initiative Advantage” is the only variable where the attacker consistently
had an advantage over the defender. This results from the fact that an
attack is often launched to seize the initiative. “Offensive operations are the
means by which a military force seizes and holds the initiative while
maintaining freedom of action and achieving decisive results. This is
fundamentally true across all levels of war.” [Ref. 11]. In other words, the
attacking side has the initiative advantage almost “by definition”.


If the two exceptions discussed earlier, Israel among the countries and
“Initiative Advantage” among the variables, are put aside, in more than 50
percent of the battles, there is no advantage on either side. As an
attacker, Britain had a Combat Effectiveness Advantage in 48 percent of
the battles, which is the only exception. This also supports our earlier
claims. Deciding the values of the relative variables is so difficult that even
the historians could not find an advantage on either side in more than 50
percent of the battles.




D. SUMMARY 


In this chapter, the variables that are considered to be important were analyzed 


with respect to the countries. Other than the results discussed in the previous sections, 


some other important considerations are given below: 


It is important to note that, although this data set is the best data set on 
historical land combat, it is not at all perfect. It would be a serious mistake 
to accept this data as the ultimate truth because: 


1. The data was collected by military historians. Therefore, the battles
listed in the data set were selected according to the historians’ comfort level. They are
not all of the battles fought, nor are they necessarily the most
important ones or a random sample. It may be the case that the historians
did not have sufficient data on many very important battles and
therefore ignored them.


2. The countries we focus on, the USA, Britain, Germany and Israel,
are usually considered to be successful on the battlefield. We are
forced to do this because these are the ones for which sufficient
data exists. They all have similar kinds of properties: extensive use
of technology, a large economy behind the war machine and
extensive experience. Some may argue that Israel does not fall into
that category, but compared to its enemies, the difference is
obvious. As a result, it is hard to find the advantages caused by
nationality.


All of the battles in the data set have exactly one nation as the
attacker and one as the defender. In reality, this is not the case in many
battles. There were and still are alliances. This appears to be another
limitation of the data set.


An interesting point discussed before is the fact that the USA had a very large
force in the field towards the end of WWII. This makes it difficult to
analyze the nationality factor of the USA. It can be said that accumulating
a large force on the battlefield and outnumbering the enemy is a main
characteristic of the USA, but this could only be decided upon for certain
with the support of military historians.


When the objective variables were analyzed, all countries had different
characteristics. Had they had similar properties, it would have been easier
to determine the effect of nationality. However, in our case, one may not
be able to decide whether it is the objective variables or the nationality
factors that affect the outcome.


Israel’s values (force ratio, artillery ratio, etc.) are smaller than those of
the other countries, which suggests that the way in which wars are fought has
changed. Even fewer weapons can and do provide more lethality.




III. CLASSIFICATION TREES


A. INTRODUCTION 

In this chapter, classification tree models will be analyzed. First, the reader will be 
informed about tree-based modeling, a relatively new analysis method. Second, we will 
introduce the tree models built using the CDB90G data set. A discussion on the models 


built, and further study suggestions, will conclude this chapter. 


Tree-based modeling is an exploratory technique for uncovering structure in data. 
Specifically, the technique is useful for classification and regression problems when one 
has a set of classification or predictor variables (x) and a single response variable (y). 
Tree-based models are relatively new, but are gaining widespread popularity as a means
of devising prediction rules for rapid and repeated evaluation, as a screening method for
variables, as a diagnostic technique to assess the adequacy of linear models, and simply
for summarizing large multivariate datasets. [Ref. 7] We will use them for the last purpose.


Trees simply show the structure of the data. Trees do not need distributional 
assumptions, and as such, transformations are not needed. Any interactions between 
variables are automatically included in the tree structure. Furthermore, they are robust to 


outlying data. 


Trees are arranged hierarchically. Until a terminal node is reached, the data 


flowing down the tree encounters one decision at a time. 


One of the advantages of tree-based models is that they are easy to read. There are 
oval (non terminal or split) and rectangular (terminal) nodes. Each node contains the 
predicted outcome and the distribution to the child nodes. The split criterion is shown on 


each branch. 


For example, Figure 11 shows a tree model built on the entire data set. The root
node says that there are 657 (260+397) data points in the data set. In 260 of them the
attacker did not win, and in 397 the attacker won. The first split is
determined by which country is defending. If the defender is Britain, the Confederate
States, Israel, or the USA, we go to the left node (a terminal node), and if the defender is
one of the other nations (Austria, Egypt or the others listed on the right branch), we
go to the right node, which is a split node. According to the terminal node
on the left, the tree model predicts that the attacker does not win, i.e., the outcome could be
a loss or a draw. At the terminal node on the left, there are 138 observations, 87 of which are “-1”
for attacker losses, and 51 of which are “1” for attacker wins. If we go to the right branch, we
reach a split node. In that split node, there are 173 observations with a WINA value of “-1”
and 346 observations with a WINA value of “1”. At that node, the question is “What
is the force ratio?” If the force ratio is less than 5.38194, then the attacker is predicted to
win. This is the terminal node in the middle. Again, 171 refers to the number of “-1”s in
that node, and 318 refers to the number of “1”s. If the force ratio is greater than 5.38194, the
right branch is chosen, which also suggests a win for the attacker. However, as the reader can
recognize, although both of these terminal nodes, the middle one and the one on the right,
suggest a win for the attacker, their misclassification rates are different. Now, we will
discuss the algorithm behind the tree models, how the splits are decided, and how the tree is
built.


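[Figure 11 tree diagram. First split on nationD: BR, CS, IS, USA go to the left terminal node (87/51); the remaining nations go to the right split node, whose middle terminal node is (171/318).]


Figure 11. Tree Model of the Entire Data Set.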
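
The reading of Figure 11 described above can be sketched in Python as follows. The counts are those quoted in the text, written as (attacker did not win, attacker won). The counts of the right-most terminal node are not quoted in the text and are obtained here by subtraction from the split node, which is an assumption of this sketch. All function and variable names are hypothetical, and observations with a missing force ratio are not handled.

```python
# Defending nations sent to the left terminal node by the first split.
LEFT_DEFENDERS = {"BR", "CS", "IS", "USA"}
# Threshold on force ratio at the second split.
FORCE_RATIO_CUT = 5.38194

# Terminal node counts as (attacker did not win, attacker won).
LEAF_COUNTS = {
    "left": (87, 51),
    "middle": (171, 318),
    # Derived by subtraction from the split node (173/346); an assumption.
    "right": (173 - 171, 346 - 318),
}


def leaf_for(nation_d, force_ratio):
    """Follow one battle down the tree, one decision at a time."""
    if nation_d in LEFT_DEFENDERS:
        return "left"
    return "middle" if force_ratio < FORCE_RATIO_CUT else "right"


def predicted_wina(counts):
    """Predict the majority class of a node: 1 (attacker wins) or -1."""
    not_won, won = counts
    return 1 if won > not_won else -1


def misclassification_rate(counts):
    """Share of a node's observations that are not in its majority class."""
    return min(counts) / sum(counts)


def overall_misclassification(leaves):
    """Misclassification rate of the whole tree over all terminal nodes."""
    wrong = sum(min(counts) for counts in leaves.values())
    total = sum(sum(counts) for counts in leaves.values())
    return wrong / total
```

Under these assumptions, the two nodes that predict an attacker win differ markedly: about 35 percent of the observations in the middle node are misclassified, against about 7 percent in the right node.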


The tree models are fit by binary recursive partitioning, by which the data set is
successively split into increasingly homogeneous subsets. [Ref. 7] The usual set-up for
regression trees, or for classification trees if the response variable is categorical, is as follows. The
$n$ responses $y_1, \ldots, y_n$ (in our models, the variable WINA) and the predictors $x_i$ are collected
for each $y_i$. Starting with all $y$’s in one node, the impurity of that node is measured.
Impurity can be one of several different measures: deviance, Residual Sum of Squares
(RSS), Mean Squared Error (MSE), etc. Both S-Plus and the rpart algorithm measure
impurity by deviance. For more information on deviance, see Devore pp. 502-503 [Ref.
8]. The objective is to divide the observations into sub-nodes of high purity, i.e., nodes that have as
many similar $y$’s as possible. So, at a node (a “split”), if the predictor is categorical, the data is
divided into two subsets, one including some of the categories, the other with the rest;
e.g., we might separate a group of nations from the others. If the predictor is continuous, every
possible split of the form $X < a$ is considered. The criterion by which the split is decided is
called the split criterion. Then, the impurity (RSS) is computed for each of the two
groups. The split decreasing the impurity most is chosen. [Ref. 12] This process of
splitting can continue down to every single observation, which would be over-fitting. In
order to avoid this, the tree construction continues only until the number of observations in
each node is small (by default $n_i < 20$ for rpart) or the leaf is sufficiently homogeneous,
i.e., has small impurity.
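
As an illustration of how one split on a continuous predictor can be chosen, the following Python sketch tries every threshold between consecutive distinct values and keeps the one that decreases the deviance most. It is a simplified sketch, not the rpart or S-Plus implementation: the deviance formula used, $-2 \sum_k n_k \log(n_k / n)$, the function names and the toy data are assumptions made for this example.

```python
import math


def deviance(labels):
    """Multinomial deviance of a node: -2 * sum over classes of n_k * log(n_k / n)."""
    n = len(labels)
    total = 0.0
    for cls in set(labels):
        n_k = labels.count(cls)
        total += n_k * math.log(n_k / n)
    return -2.0 * total


def best_numeric_split(x, y):
    """Return (cut, decrease in deviance) for the best split of the form x < cut.

    Returns None when x has fewer than two distinct values.
    """
    pairs = sorted(zip(x, y))
    parent = deviance([label for _, label in pairs])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        decrease = parent - deviance(left) - deviance(right)
        if best is None or decrease > best[1]:
            best = (cut, decrease)
    return best


# Invented toy data: force ratios and outcomes (1 = attacker won, -1 = did not win).
force_ratio = [0.8, 1.0, 1.5, 2.5, 3.0, 4.0]
wina = [-1, -1, -1, 1, 1, 1]
cut, decrease = best_numeric_split(force_ratio, wina)
```

In this toy data the outcomes are perfectly separated at a force ratio of 2.0, so both child nodes have zero deviance and the decrease equals the deviance of the parent node. Applying the same search recursively to each child node, and stopping when nodes become small, gives the partitioning described above.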


There are several tree methods available. Since there are many missing values in 
the data set (Table 10), we prefer to use the rpart [Ref. 10] method because of the way 


it handles the missing values. 


PERIOD        force   arty   fly   cav   tank   Total Number of Battles
Before WWI      0      95    251   107   251    251
WWI             0      46    126   130   101    133
After WWII      0       2      2     7     4     80
TOTAL           1     ...    ...   ...   ...    ...


Table 10. Number of Missing Values.


This table gives the number of missing values for the objective variables
used in building the tree models. It is important to note that tanks and
airplanes were not present before WWI and they were used on a very
small scale during WWI. Also, cavalry was used mainly before WWI.
Therefore, some of the large numbers simply reflect historical facts.
However, even taking this into consideration, a large missing value
problem exists, forcing us to use rpart.


In rpart, when missing values are encountered in considering a split, they are 
ignored and the probabilities and impurity measures are calculated from the non missing 
values of that variable. Surrogate splits are then used to allocate the missing cases to the 
daughter node. Therneau [Ref. 10] contains some more detail about the usage of 
surrogate splits in rpart. The next two paragraphs briefly explain the use of surrogate 


splits. 




Once a splitting variable and a split point for it have been decided, what is to be 
done with observations missing that variable? One approach is to estimate the missing 
datum using the other independent variables. rpart uses a variation of this to define 


surrogate variables. 


As an example, assume that “Force Ratio < 2” has been chosen as the split
criterion and there are data points missing information on the force ratio. The surrogate
variables to be used for the data points that are missing the value for the force ratio are
then found by re-applying the partitioning algorithm (without recursion). The two
categories “Force Ratio < 2” and “Force Ratio ≥ 2” are predicted using the other independent
variables. For each predictor, an optimal split point and a misclassification error are
computed. The surrogates are then ranked. Any observation that is missing the split
variable is then classified using the first surrogate variable, or if that is missing, the second
surrogate is used, and so forth. If an observation is missing all surrogates, the blind rule
of “go with the majority” is used. Other strategies for these “missing everything”
observations can be argued, but there should be few or no observations of this type. [Ref.
10]
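
The surrogate idea described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not rpart's actual procedure: surrogates are ranked by plain misclassification error only, the predictor names (`force`, `arty`, `tank`) and the data rows are invented, and missing values are represented by `None`.

```python
def best_surrogate_cut(values, goes_left):
    """Find the cut on one predictor that best reproduces the primary split.

    Returns (error rate, cut, less_goes_left), or None if no cut is possible.
    """
    pairs = [(v, g) for v, g in zip(values, goes_left) if v is not None]
    distinct = sorted({v for v, _ in pairs})
    best = None
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        for less_goes_left in (True, False):
            errors = sum(((v < cut) == less_goes_left) != g for v, g in pairs)
            rate = errors / len(pairs)
            if best is None or rate < best[0]:
                best = (rate, cut, less_goes_left)
    return best


def rank_surrogates(rows, primary, primary_cut, others):
    """Rank the other predictors by how well they mimic 'primary < primary_cut'."""
    complete = [r for r in rows if r[primary] is not None]
    goes_left = [r[primary] < primary_cut for r in complete]
    ranked = []
    for name in others:
        result = best_surrogate_cut([r[name] for r in complete], goes_left)
        if result is not None:
            rate, cut, less_goes_left = result
            ranked.append((rate, name, cut, less_goes_left))
    ranked.sort()  # smallest misclassification error first
    majority_left = 2 * sum(goes_left) >= len(goes_left)
    return ranked, majority_left


def route(row, primary, primary_cut, ranked, majority_left):
    """Send one observation left (True) or right (False)."""
    if row[primary] is not None:
        return row[primary] < primary_cut
    for _, name, cut, less_goes_left in ranked:
        if row[name] is not None:
            return (row[name] < cut) == less_goes_left
    return majority_left  # "go with the majority"


# Invented rows for illustration only.
rows = [
    {"force": 1.0, "arty": 0.9, "tank": 3.0},
    {"force": 1.5, "arty": 1.1, "tank": 1.0},
    {"force": 1.8, "arty": 1.2, "tank": 2.5},
    {"force": 2.5, "arty": 2.0, "tank": 1.2},
    {"force": 3.0, "arty": 2.4, "tank": 2.8},
    {"force": 4.0, "arty": 3.1, "tank": None},
]
ranked, majority_left = rank_surrogates(rows, "force", 2.0, ["arty", "tank"])
```

In this invented example, the artillery ratio reproduces the split “Force Ratio < 2” without error and is therefore the first surrogate; an observation missing the force ratio is routed by its artillery ratio, and one missing every predictor follows the majority direction.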


Another issue with tree models is pruning. After building the model, it is usually
the case that it is over-fitted. [Ref. 12] The trees are built in order to minimize the
impurity. In doing so, the trees grow too big. In other words, the models fit the data
at hand too closely. This, of course, decreases the model’s ability to predict new data.
Pruning is used to solve this problem.


The question we ask then is whether we want a predictive model or a descriptive 
(explanatory) one. As mentioned above, trees can be used for both. If it is a predictive 
model, pruning to the optimum size using cross-validation is vital. However, for 
descriptive models, in other words, models to explore the data, pruning is not that great of 


a concern. The tree explaining the data best is used. 




One of the problems faced in this situation is the lack of sufficient data to build
predictive models. When the trees we built are pruned to the optimum size, they become too
small to be used in predictions. As a result, we will use tree models to describe the data
and explore the nationality factors in the data set. For this reason, pruning will not be our
concern. All the models built are descriptive models.

B. TREE MODELS 

In this section, the tree models that were built are examined. Trees were built by
using the battles in which the countries that we are analyzing, the USA, Britain,
Germany, and Israel, appeared either as an attacker or a defender, and only the objective
variables were used as predictor variables. Since not all of the variables are present in
each period, only the appropriate ones, whose names are given in the sections where the
trees are described, are used to build the models.

1. Model 1: The Battles Prior to World War I 

This is the model for the battles before 1910, see Figure 12. The model shows that 
nationality was the most important factor affecting the outcome of the battle. In other 
words, if only one variable were allowed, we would choose: “What is the nationality of 
the attacker?” Three of the countries in which we are interested appeared at the first split. 
According to the model, the USA, Germany and Britain tended to win the battles in 
which they were defending. This was correct in 47 of the 69 battles in which they were 
defending. 

2. Model 2: The Battles of World War I

This is the model for the battles of WW I, see Figure 13. According to the model, 
the most important factor was force ratio. The second most important was nationality. 
However, it is important to note that the split criterion for the force ratio is 4.05. There 
are 15 battles where the attacker had a ratio at least 4.05 and the attacker won them all. 
Out of these 15, 10 were from the USA, 3 from Germany and 2 from Britain. This, again, 
leads us to the same question asked previously: Is it the nationality or the objective 


factors that have the real effect? 




3. Model 3: The Battles of World War II 

During WW II, the most important variable was artillery ratio. The second most 
important was, as in WWI, nationality. The USA, Germany and Britain again appear in 
the second split. They won the battles in which they were defending against an attacker 
who did not have sufficient artillery support. 

4. Model 4: The Battles that Israel Fought 

This particular model follows our historical segmentation. This model contains 
the battles fought after WWII, but instead of including all the battles, we focused on 


Israel, and tried to ascertain if nationality factors are important. 


Figure 12. Model 1, Battles Before World War I.


Model 1 explains 76 percent of the battles. That is, the terminal nodes correctly
classify the outcome 76 percent of the time. The most important factor is
nationality. The USA, Britain and Germany appear in the first split as
defenders explicitly and as attackers implicitly: implicitly, because these are
the battles of those three countries, so when other countries are defending, these
three are the attackers. If we were to predict the outcome of a hypothetical battle in
this period, we could predict the result only by looking at the nationality of the
countries and we would be correct 71 percent of the time. If the USA, Britain
or Germany is either defending or attacking, they win. A value of “-1” refers
either to a draw or to a loss for the attacker; there are only a few draws.
After nationality, the single most important variable is force ratio. Other
variables present in this period, artillery ratio and cavalry ratio, did not appear
in our tree. The first split on force ratio is at 1.08, which is interestingly
small. This reminds us of the findings of Yigit [Ref. 3], and his work on the 3
to 1 force ratio rule-of-thumb. As can be seen from the force ratio boxplots
[App. 1.1], before WWI, the force ratios are small and there is no significant
difference between the force ratios of the countries analyzed. This helps us to
understand two things: (1) As we claimed before (Chapter II, Conclusions),
since the countries have similar properties, it is easier to decide whether
nationality has an important effect; in this case, the tree decided that
nationality is the primary factor. (2) The split criteria related to force ratio are
small because all countries had similar force ratios.


Figure 13. Model 2, Battles of World War I.


This model explains 79 percent of the battles. The most important factor is
force ratio. The second most important factor is nationality. In this period,
force ratio began to be much more important on the battlefield. Also, now,
unlike the battles prior to WWI, the threshold is much higher. As the reader
may recall, the first threshold for force ratio in the battles prior to WWI was
1.08, see Figure 12, as opposed to 4.06 in WWI. This is mostly because of the
USA’s high force ratios: 10 out of the 15 observations in the terminal node on the
right are from the battles of the USA. Also, this is not surprising, considering the
fierce defenses of that era. It takes sheer power, i.e., force ratio, to defeat the
defender. Another point, though, is that all 15 battles that have the large force ratio
have one of the three countries as the attacker. Again, how can one decide whether it is
the force ratio or nationality that affects the outcome? The second and third splits
are on nationality. If the USA was attacking, it won even with a force ratio less
than 4.06. Britain and Germany were less successful at attacking, but the
Germans were better defenders than the Britons. The USA was good at both
defending and attacking.







Figure 14. Model 3, Battles of World War II. 


This model explains 79 percent of the battles. The most important variable is
artillery ratio. Technology and advanced weapons began to play a more
important role on the battlefield. In the battles where the attacker did not have
an artillery ratio advantage, the second most important factor is nationality. In
those battles, the USA, Britain and Germany won as defending armies. In the
battles where the attacker has an artillery advantage, the second most important
variable is tank ratio. A tank ratio advantage of 3.7, along with an artillery advantage of
1.3, almost guaranteed the attacker’s victory (64 out of 79 battles). When the
tank ratio is smaller, Britain won the battles where it attacked, while
Germany and the USA lost as attackers if they did not have a tank ratio of 1.9
or an artillery ratio of 3.15 or more. To summarize, all three countries are good
defenders; Britain is a better attacker than Germany and the USA when it has
less power. However, the importance of weapons appears much higher than
in the previous time periods.







Figure 15. Model 4, Battles that Israel Fought. 


This simple model explains 81 percent of the battles. The importance of
advanced weapons is still increasing. The only important variable is air
force ratio. However, we know from the data set that Israel won 82
percent of the battles in which it attacked. Again, as we claimed previously (see
Chapter II, Conclusions), the countries we are analyzing are those already
making use of the more important factors, the decisive variables such as artillery
and tank ratio in WW II or air force ratio after WW II. Thus, it is difficult
to decide where nationality has a role, or what is a nationality factor.
These issues will be addressed at the end of this chapter.


After the models are fit, the question is how good the models are, or how
important the nationality factors are. For a predictive model, there are a couple of ways to
ensure quality. One is cross-validation, which is used with rpart’s
“prune.rpart()” command. This was tried and proved ineffective because of a lack
of information. The shortage of data, as discussed above, was the main reason for
building explanatory models rather than predictive ones. Another way to build better
trees is to divide the data into random subsets, build the model using one of
the subsets as the training set, and then evaluate the tree with the rest of the data.
Doing this with different subset variations until a good tree is built is another approach to
building good models. However, here we have the same problem as with cross-validation:
insufficient information.


Our models, as mentioned previously, are explanatory as opposed to predictive 
models. Thus, there is another measure on which we can assess our models: the 


misclassification rate. 


The misclassification rate is the percentage of the data that the model classifies
incorrectly; its complement is the percentage of the data that can
actually be explained with the model. We will use it as our measurement to assess the
models. Our models with the nationalities were presented in the previous section. To
evaluate the importance of nationality, models without the nationality factors were also
built. They will not be presented, but instead, misclassification rates with and without
nationality will be compared. The next table contains those values.

















               WITH NATIONALITY   WITHOUT NATIONALITY
BEFORE WWI          0.244               0.315
DURING WWI          0.212               0.250
DURING WWII         0.216               0.219














Table 11. Misclassification Rates of the Trees with and without Nationality. 


As can be seen from the table, the effect of nationality, i.e., the change in the
misclassification rate when nationality is included, was largest before WWI. That is
not a surprise, since nationality was the primary split in that period. An
improvement of approximately 7 percentage points in the misclassification rate occurred
when nationality was used. During WWI, the improvement was 3.8 percentage points.
Beginning with WWII, the nationality variable began to be a rather
unimportant factor.
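
The improvements quoted above follow directly from Table 11. A minimal Python sketch of the arithmetic is given below; the rates are copied from the table, while the variable and function names are hypothetical.

```python
# Misclassification rates from Table 11: (with nationality, without nationality).
rates = {
    "Before WWI": (0.244, 0.315),
    "During WWI": (0.212, 0.250),
    "During WWII": (0.216, 0.219),
}


def improvement_points(with_nationality, without_nationality):
    """Drop in misclassification rate, in percentage points, from adding nationality."""
    return round((without_nationality - with_nationality) * 100, 1)


improvements = {period: improvement_points(*pair) for period, pair in rates.items()}
```

The results are 7.1 percentage points before WWI, 3.8 during WWI and 0.3 during WWII, which is the pattern of declining importance described above.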


C. SUMMARY

In this chapter, we analyzed the data set using classification trees. Some of the
important conclusions reached follow.

• Nationality is the most important variable prior to WW I.


• There is an obvious trend in history related to predicting the outcome of
the battle: as time passes, technology and advanced weapons play
a more important role in deciding the outcome of the battle. Force ratio
was the decisive factor up to WW II, artillery and tank ratios were decisive in
WW II, and the air force ratio after that. However, as demonstrated, the
countries we analyzed, the USA, Germany, Britain and Israel, usually use
those weapons more effectively than the others. This consistent
ability to use the most effective weapon systems is their characteristic.
Thus, even if the exact figures about their force structure are not available,
it would not be wrong to predict that they have enough to win the battle
they are fighting.


• The USA will almost certainly have an overwhelming force on the
battleground to ensure they win.




IV. CONCLUSION 


The analyses produced some interesting results. As we mentioned in the 


introduction chapter, our purpose was to find the importance of nationality on battle 


outcomes. For the analyses, we did the following: 


The analyses focused on four different countries: The USA, Germany, 
Israel and Britain. 


Since the nature of warfare evolves, the data set is divided into four 
periods: battles before WWI, WWI, WWII and the battles after WWII. 


By combining our findings from summary statistics and the tree models, we 


conclude the following: 


Relative variables are avoided intentionally, and it is not a good idea to
use them in the models. The reason for avoiding them is that they are 
subjective and hard to determine before a battle. In addition, we also found 
that the data set does not contain much information on the values of these 
variables. That is, according to the data, in the majority of the battles 
neither side has an advantage. In other words, even if one decides to use 
the relative variables in a model, it will be difficult to find discriminatory 
information in this data set. 


The tree models show that nationality was the most important factor in the 
battles before WWI. This is in line with the findings of Coban [Ref. 4], 
who found that in the battles before WWI, the relative variables are more 
important than objective variables. Here, we are using nationality as a 
surrogate for the relative factors. In this thesis, one of the questions we 
asked was whether we can replace all relative variables with just 
nationality. As the results demonstrate, we can replace the relative
variables with nationality alone, when relative variables are important.
Coban’s model for the battles before WWI has a misclassification rate of
21 percent, and ours has 24 percent. Although the analysis methods
differ in that his is a predictive model whereas ours is an
explanatory one, the comparison provides a very good indication of the
soundness of our model.


The importance of weapons and technology has been increasing since the
beginning of the 20th century. The countries examined also made
consistent use of weapons and technologies, which affect the outcome of
a battle. Therefore, even though we cannot determine the exact
importance of nationality from our tree models alone, we can conclude,
when combining them with the other analyses, that the four countries
(the USA, Germany, Britain and Israel) are expected to have
sufficient weapons on the battleground to win the battle. Given the
amount of data available on the battles of the USA, it is easier to
reach a conclusion about the USA’s nationality factor: the USA has
almost always had overwhelming military power, and this is a
national characteristic of the USA. Recent conflicts make this
conclusion appear even more solid today.


Although we conclude that the objective variables are more closely
associated with the outcome of the battle, we cannot say that an advantage
in these guarantees success. The analyses in the second chapter showed
that in most of the battles, no statistically significant difference exists
between the relative variable values in the battles won or lost. This leads
us to one truth about the phenomenon of warfare: in war, luck and other
factors that can be neither predicted nor even named have a very large
influence.
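
The misclassification rates quoted above (21 percent and 24 percent) are the fraction of battles whose outcome a tree assigns incorrectly. The following is a minimal sketch of that calculation in Python, not the S-Plus code used for the analyses; the function name and the outcome labels are hypothetical and are not taken from the battle data set.

```python
def misclassification_rate(actual, predicted):
    """Return the fraction of cases whose predicted label differs from the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical labels for illustration only (not taken from the battle data set).
actual_outcomes = ["win", "win", "loss", "win", "loss"]
tree_predictions = ["win", "loss", "loss", "win", "win"]
rate = misclassification_rate(actual_outcomes, tree_predictions)
```

Two models can be compared on this measure only when it is computed on comparable sets of battles, which is why the comparison with Coban’s model is treated as an indication rather than a proof.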


A. FURTHER STUDY SUGGESTIONS 


We used only the variable nationA, the nationality of the attacking
country, as our response variable in the analyses done in Chapter II,
Summary Statistics. It would be helpful to see the results when nationD,
the nationality of the defending country, is used as well.


Although S-Plus is a very powerful software package, it does have some
limitations. Several new algorithms related to classification trees are
available in other software packages. For example, with the methods
available in S-Plus, each split has only two branches, but in Clementine
the user can decide the number of branches at each split. It would be
interesting to see the results if splits are forced on each nation, in
other words, if the tree is grown so that every branch from a split
corresponds to a different nation.


For specific countries, further analyses can be done in more detail by
using other statistical analysis techniques. For example, with the amount
of data available, it is possible to analyze the battles of the USA and find
their specific characteristics. Combined with the results of these
analyses, tree models may then produce more significant results. Cluster
analysis might also be a good way to analyze the data set.


We noted earlier that this data set is not the ultimate truth
(Chapter II, Section D, Summary). Beyond that, in the analyses, we
considered all battles equal, which in our opinion is a pitfall. Battles
differ with respect to their size and the importance of their results.
Within the same data set, a different selection of battles can be made.
Professional help from a historian might be useful to do this. For
example, forming more homogeneous subsets and discarding the battles
with an unreasonable force structure, such as those in WWII in which
the Allies had an incredible advantage over Germany, may help reach
better conclusions.
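
The suggestion above about forcing a split on each nation can be illustrated with a small Python sketch that compares a two-branch split with a split having one branch per nation. The records are hypothetical, and Gini impurity is used only as a convenient measure of how mixed the outcomes in each branch are; it is an assumption of this sketch, not necessarily the criterion used by S-Plus or Clementine.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all labels agree)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(records, branch_of):
    """Weighted Gini impurity after sending each (nation, outcome) record to branch_of(nation)."""
    branches = defaultdict(list)
    for nation, outcome in records:
        branches[branch_of(nation)].append(outcome)
    total = len(records)
    return sum(len(b) / total * gini(b) for b in branches.values())


# Hypothetical records for illustration only (not taken from the battle data set).
records = [
    ("USA", "win"), ("USA", "win"),
    ("GER", "win"), ("GER", "loss"),
    ("BR", "loss"), ("BR", "loss"),
    ("IS", "win"), ("IS", "win"),
]

# Two-branch split: the nations are partitioned into two groups.
binary = split_impurity(records, lambda nation: nation in {"USA", "IS"})
# Multiway split: every nation gets its own branch.
multiway = split_impurity(records, lambda nation: nation)
```

In this toy example the per-nation split leaves less impurity than the two-branch split, but a multiway split also produces smaller branches, so any gain seen on real data would need to be checked against the number of battles in each branch.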




APPENDIX A. TABLES OF RELATIVE VARIABLES 


In this section, tables for the relative variables analyzed in the summary statistics
section are provided. There are four tables for each variable, each broken down by time
period. The first two tables are for the battles where the countries attack and win; the last
two are for the battles where they attack and lose. The first and third tables give the exact
number of battles for all countries. The second and fourth tables cover the four
countries we analyze and give proportions of the battles. The reader may refer
to p. 26 for further explanation on how to read the tables correctly.


A. “SURPA” 


SURPRISE ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


SURPRISE ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]





B. “CEA” 


RELATIVE COMBAT EFFECTIVENESS, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


RELATIVE COMBAT EFFECTIVENESS, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]








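C. “AEROA”

AIR FORCE ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


AIR FORCE ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]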


D. “LEADA” 
LEADERSHIP ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


LEADERSHIP ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]





E. “TRNGA” 
TRAINING ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


TRAINING ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]





F. “MORALA” 
MORALE ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


MORALE ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


G. “LOGSA” 
LOGISTICS ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


LOGISTICS ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]





H. “MOMNTA” 
MOMENTUM ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


MOMENTUM ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]





I. “INTELA” 
INTELLIGENCE ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


INTELLIGENCE ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


J. “TECHNA” 
TECHNOLOGY ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


TECHNOLOGY ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, with columns O, A and D for each of OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000. The cell values are not legible in this copy.]


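K. “INITA”

INITIATIVE ADVANTAGE, ATTACKER WINS

[Count table for all countries and proportion table for the four countries analyzed, by period (OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000). The cell values are not legible in this copy.]


INITIATIVE ADVANTAGE, ATTACKER LOSES

[Count table for all countries and proportion table for the four countries analyzed, by period (OVERALL, 1600-1913, 1913-1939, 1939-1945 and 1945-2000). The cell values are not legible in this copy.]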


APPENDIX B. BOXPLOTS OF OBJECTIVE VARIABLES 


This section contains the boxplots of the objective variables analyzed in the second
chapter that were not shown there. The reader may refer to p. 15 for more explanation
of the boxplots.
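
As a reminder of what each boxplot summarizes, the following minimal Python sketch computes the five values on which a box is drawn. The force ratios are hypothetical, and the quartile convention (the "inclusive" method of Python's statistics module) is an assumption of this sketch; S-Plus may use a slightly different rule for quartiles and for flagging outliers.

```python
import statistics


def boxplot_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum) of values."""
    data = sorted(values)
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]


# Hypothetical force ratios for illustration only (not taken from the battle data set).
force_ratios = [0.8, 1.0, 1.5, 2.0, 4.0]
summary = boxplot_summary(force_ratios)
```

Computing one such summary per country and period gives the numbers behind each row of the plots in this appendix.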


A. FORCE RATIO 





[Figure: boxplots by country, “Force Ratio Before WW1”; horizontal axis: forceRatio.]





[Figure: boxplots for USA, GER and BR, “Force Ratio During WW1”; horizontal axis: forceRatio.]








[Figure: boxplots for USA, GER and BR, “Force Ratio During WW2”; horizontal axis: forceRatio.]





B. ARTILLERY RATIO


[Figure: boxplots for USA, GER and BR, “Artillery Ratio Before WW1”; horizontal axis: arty.]











[Figure: boxplots for USA, GER and BR, “Artillery Ratio During WW1”; horizontal axis: arty.]











[Figure: boxplots for USA, GER and BR, “Artillery Ratio During WW2”; horizontal axis: arty.]








C. AIR FORCE RATIO


[Figure: boxplots for USA, GER and BR, “Airforce Ratio During WW2”.]







APPENDIX C. ACRONYMS 


COUNTRY NAMES 
AUS: Austria
ENG: England
BR: Britain
GER: Germany
PRUSS: Prussia
IS: Israel
USA: United States of America
SOV: USSR
RUSS: Russia
CS: Confederate States (present only in the battles of the American Civil War)
TU: Turkey
EG: Egypt
SYR: Syria



LIST OF REFERENCES 


1. Hartley, Dean S., “Topics in Operations Research: Predicting Combat Effects,”
Military Applications Society of INFORMS, 2001.

2. Dupuy, Col. T. N. (U. S. Army, Ret.), “Numbers, Predictions and War,” Hero
Books, 1985.

3. Yigit, Faruk, “Finding the Important Factors in Battle Outcomes: A Statistical
Exploration of Data from Major Battles,” Master’s Thesis, Naval Postgraduate
School, Monterey, California, 2000.

4. Coban, Muzaffer, “Predicting Battle Outcomes with Classification Trees,”
Master’s Thesis, Naval Postgraduate School, Monterey, California, 2001.

5. Personal Communication from Professor Thomas Lucas, Operational Research
Department, Naval Postgraduate School, Monterey, California.

6. Slate Magazine, MSN, [http://slate.msn.com], February 26, 2003.

7. Chambers, John M. and Hastie, Trevor J., “Statistical Models in S,” Wadsworth
& Brooks/Cole Advanced Books & Software, 1992.

8. Devore, Jay L., “Probability and Statistics for Engineering and Sciences,”
Duxbury, 2000.

9. Venables, W. N. and Ripley, B. D., “Modern Applied Statistics with S-PLUS,”
Springer, 1999.

10. Therneau, Terry M. and Atkinson, Elizabeth J., “An Introduction to Recursive
Partitioning Using the RPART Routines,” Mayo Foundation, September 3, 1997.

11. FM100-5 OPERATIONS,
[http://usasma.bliss.army.mil/Pubs/FM_100-5/FM_100-5.pdf], May 8, 2003.

12. Class Notes OA 3103, Samuel E. Buttrey, Operational Research Department,
Naval Postgraduate School, Monterey, California.

13. S-Plus 4, Guide to Statistics, Data Analysis Products Division, MathSoft, Inc.,
Seattle, Washington.




INITIAL DISTRIBUTION LIST 


Defense Technical Information Center 
Ft. Belvoir, Virginia 


Dudley Knox Library 
Naval Postgraduate School 
Monterey, California 


Thomas W. Lucas 
Naval Postgraduate School 
Monterey, California 


Samuel E. Buttrey 
Naval Postgraduate School 
Monterey, California 


Ali Cakan 


Kara Kuvvetleri Personel Baskanligi 
Ankara, Turkey 




